Abstract

This paper presents a new method, referred to here as the sparsity invariant transformation based $\ell_1$ minimization, to solve the $\ell_0$ minimization problem for an over-determined linear system corrupted by additive sparse errors with arbitrary intensity. Many previous works have shown that $\ell_1$ minimization can be applied to realize sparse error detection in many over-determined linear systems. However, the performance of this approach depends strongly on the structure of the measurement matrix, which limits its applicability in practical problems. Here, we present a new approach based on transforming the $\ell_0$ minimization problem by a linear transformation that keeps sparsest solutions invariant. We call such a property a sparsity invariant property (SIP), and a linear transformation with SIP is referred to as a sparsity invariant transformation (SIT). We propose the SIT-based $\ell_1$ minimization method by using an SIT in conjunction with $\ell_1$ relaxation of the $\ell_0$ minimization problem. We prove that for any over-determined linear system, there always exists a specific class of SITs that guarantees a solution to the SIT-based $\ell_1$ minimization is a sparsest-errors solution. In addition, a randomized algorithm based on Monte Carlo simulation is proposed to search for a feasible SIT.


Index Terms 

Sparsest error detection, sparsest recovery, sparsity invariant transformation, SIT-$\ell_1$ minimization.

I. Introduction 

This paper presents a new approach to transform the exact sparsest error detection problem for a linear system 
corrupted by additive errors with arbitrary intensity. Let $y \in \mathbb{R}^n$ denote a measurement vector, $A \in \mathbb{R}^{n \times r}$ a
measurement matrix, $x \in \mathbb{R}^r$ a hidden signal, and $e$ a noise vector. Then a noisy linear measurement system can
be represented by the equation 

y = Ax + e. (1) 

Depending on the application, many such models have been proposed and analyzed. Here, we consider $e$ as an error vector and focus on an over-determined measurement system corrupted by sparse errors. In other words, we assume $n > r$, most entries of $e$ are zeros, and our goal is to detect the sparsest errors and thus determine a sparsest-errors solution. The sparsest error detection problem can be formulated as the $\ell_0$ minimization problem below:

$(\ell_0) \quad \min_x \|y - Ax\|_0.$

In the rest of the paper, we also refer to this $\ell_0$ minimization problem as Problem $(\ell_0)$. Define $x^*$ as a solution to this problem and let $e^* = y - Ax^*$ denote the corresponding sparsest error vector. For simplicity, and without loss of generality, we assume the rank of matrix $A$ is equal to $r$, the number of columns of $A$.

Usually, Problem $(\ell_0)$ is hard to solve due to its non-convexity and combinatorial complexity in high dimensions. A commonly accepted solution is to apply $\ell_1$ relaxation to obtain an approximate solution to Problem $(\ell_0)$. The $\ell_1$-relaxed problem can be written as:

$(\ell_1) \quad \min_x \|y - Ax\|_1.$

We refer to this $\ell_1$ minimization problem as Problem $(\ell_1)$ or the direct $\ell_1$ minimization problem in the rest of the paper. Define $\hat{x}$ as a solution to Problem $(\ell_1)$ and let $\hat{e} = y - A\hat{x}$ denote the corresponding $\ell_1$-minimal error vector.

Over the years, researchers have observed empirically that $\ell_1$ minimization, arising for example from the basis pursuit problem 0J, or from the application of the $\ell_1$-norm penalty in model selection problem [5], tends to produce solutions containing many zero entries. Subsequently, it was proved in [6] that if a signal has a sparse decomposition under a specially-structured matrix, then the sparse decomposition can be achieved by $\ell_1$ minimization. More recently, various sufficiency conditions ([2], 0-og) on the measurement matrix have been established to guarantee exact sparsity recovery via $\ell_1$ minimization. Roughly speaking, these are conditions requiring a measurement matrix to be either incoherent or near orthonormal. From this, we see that the performance of $\ell_1$ minimization in finding the sparsest solution relies on the structure of the measurement matrix.

To remove this reliance, this paper provides a new methodology to ensure solution equivalence between $\ell_0$ and $\ell_1$ minimization after imposing a linear transformation on the measurement vector space. The linear transformation is required to preserve the sparsity property of an original solution, and thus is referred to as a sparsity invariant transformation (SIT). A major theoretical contribution of this paper is to show that such an SIT exists. As a result, the proposed methodology allows the application of $\ell_1$ minimization to a general, "condition-free" measurement matrix for sparsest error detection or sparsest signal recovery.

For simplicity, we derive the theoretical results based on the sparsest error detection model for an over-determined system. Since the task of finding a sparsest solution to an under-determined system can be equivalently converted into a sparsest error detection problem in an over-determined system [2], our results also apply to an under-determined system for sparsest recovery. Even though we have proven the existence of an SIT, there is no known algorithm for constructing an SIT in general. Instead, we propose a heuristic randomized algorithm based on Monte Carlo simulation for its construction. Numerical results in section 6 demonstrate that the proposed methodology is effective in sparsest error detection, especially in cases where direct $\ell_1$ minimization fails to detect the sparsest errors.

The rest of the paper is organized as follows: In section 2 we formally introduce the sparsity invariant transformation $\Phi$. In section 3 we prove the existence of $\Phi$ for a general error detection problem. Section 4 describes how the proposed methodology can be applied to the sparse representation problem. Section 5 presents a randomized algorithm to construct $\Phi$. Section 6 highlights some numerical results to demonstrate the advantages of the proposed methodology. There is evidence that the new approach works for cases where direct $\ell_1$ minimization performs poorly.

Notation 

We use $\mathrm{span}(A)$ to denote the subspace spanned by the columns of $A$. Let $\mathrm{span}([A, y])^{\perp}$ denote the orthogonal complement of $\mathrm{span}([A, y])$. We use $I_r \in \mathbb{R}^{r \times r}$ to denote the $r \times r$ identity matrix. Let $[n] = \{1, 2, \ldots, n\}$. For $S \subseteq [n]$ (resp. $S \subseteq [m]$), denote by $A(S, \cdot)$ (resp. $A(\cdot, S)$) the sub-matrix formed by the rows (resp. columns) of matrix $A$ indexed by the elements of $S$. If $S = \{i\}$, then $A(i, \cdot)$ (resp. $A(\cdot, i)$) denotes the $i$-th row (resp. $i$-th column) of matrix $A$. If $y$ is a vector, let $y(S)$ denote the sub-vector formed by the entries of $y$ indexed by the elements of $S$. If $S = \{i\}$, then $y(i)$ denotes the $i$-th entry of $y$. Let $\mathrm{supp}(e)$ denote the support set of $e$.

II. Sparsity Invariant Transformation 

In general, a solution to Problem $(\ell_1)$ may not be a solution to Problem $(\ell_0)$. Instead of imposing conditions on the measurement matrix to guarantee an equivalence between the two solutions, as many researchers have done in compressed sensing, we set out to create an equivalence by transforming the original linear system by means of an invertible linear transformation. Invertibility is required in order to avoid losing any useful information during the transformation. Assume $\Phi$ is an invertible matrix; then the transformed $\ell_0$ minimization problem can be written as:

$(SIT_0) \quad \min_x \|\Phi y - \Phi A x\|_0.$

Define $x_\Phi^*$ as a solution to Problem $(SIT_0)$ and let $e_\Phi^* = \Phi y - \Phi A x_\Phi^*$ denote the corresponding sparsest error vector. To make the transformation meaningful, we need to find a specific $\Phi$ that makes $e_\Phi^*$ and $e^*$ equivalent in a sense. Formally stated, we require that the linear transformation satisfy the following sparsity invariant property (SIP).

Definition: $\Phi$ is an SIP transformation if it is invertible and there exist non-zero scaling factors $\lambda$ and $\mu$ such that $\lambda e_\Phi^*$ is a sparsest error vector to Problem $(\ell_0)$ and $\mu e^*$ is a sparsest error vector to Problem $(SIT_0)$. (Note that if Problem $(\ell_0)$ has a unique solution then $\mu$ is equal to $1/\lambda$.)

In the rest of the paper, we refer to Problem $(SIT_0)$ as the SIT-$\ell_0$ minimization problem, and $\Phi$ is an SIT if it is an SIP transformation.

It follows from the SIP definition that, ignoring differences in magnitude, an SIT preserves the sparsest error vector of a linear system but may change the $\ell_1$-minimal error vector. If the sparsest error vector and the $\ell_1$-minimal error vector do not coincide in the original system, i.e., $y = Ax + e$, then we will show that an SIT can be applied to transform the original system into an equivalent representation in which the above two error vectors coincide.

To continue the analysis, we need to find matrices with the sparsity invariant property. To this end, define a family of matrices, $\mathcal{R}$, by

$\mathcal{R} = \{\Phi \in \mathbb{R}^{n \times n} \mid \Phi(\mathrm{span}([A, y])) = \mathrm{span}([A, y]),\ \Phi f = f,\ \forall f \in \mathrm{span}([A, y])^{\perp}\}.$

Since any element $\Phi$ in $\mathcal{R}$ has zero null-space, $\Phi$ is invertible. The following lemma shows that if a matrix belongs to $\mathcal{R}$, then its inverse also belongs to it.

Lemma 1 If $\Phi$ belongs to $\mathcal{R}$, then $\Phi^{-1}$, the inverse of $\Phi$, also belongs to $\mathcal{R}$.

The proof is simple: since $\Phi \in \mathcal{R}$, we have

$\Phi^{-1} f = \Phi^{-1} \Phi f = f$

for any $f \in \mathrm{span}([A, y])^{\perp}$. Thus lemma 1 holds. ■

The next proposition proves that $\mathcal{R}$ defines a family of matrices with SIP.

Proposition 1 If $\Phi \in \mathcal{R}$ then $\Phi$ satisfies the sparsity invariant property.

Proof Using the notation introduced previously, let $e^*$ be a sparsest error vector to Problem $(\ell_0)$ and $e_\Phi^*$ be a sparsest error vector to Problem $(SIT_0)$. Let $e_1 = \Phi^{-1} e^*$. Since $e^* \in \mathrm{span}([A, y])$ and $\Phi^{-1} \in \mathcal{R}$ according to lemma 1, it follows that $e_1 \in \mathrm{span}([A, y])$. Therefore, there exists a non-zero real number $\lambda$ such that

$y - \lambda e_1 \in \mathrm{span}(A).$

Denoting $y - \lambda e_1 = A x_1$, we have

$e^* = \Phi e_1 = \frac{1}{\lambda} \Phi (y - A x_1), \qquad (2)$

which implies

$\|e^*\|_0 \geq \|e_\Phi^*\|_0. \qquad (3)$

On the other hand, let $e_2 = \Phi e_\Phi^* \in \mathrm{span}([\Phi A, \Phi y])$; then there exist a non-zero real number $\mu$ and a vector $x_2$ such that

$\Phi y - \mu e_2 = \Phi A x_2.$

From this, it follows that

$e_\Phi^* = \Phi^{-1} e_2 = \frac{1}{\mu} (y - A x_2), \qquad (4)$

which implies

$\|e_\Phi^*\|_0 \geq \|e^*\|_0. \qquad (5)$

According to (3) and (5), we infer that

$\|e_\Phi^*\|_0 = \|e^*\|_0. \qquad (6)$

Based on (2) and (6), we know that $\lambda e^*$ is a sparsest error vector to Problem $(SIT_0)$. Based on (4) and (6), we know that $\mu e_\Phi^*$ is a sparsest error vector to Problem $(\ell_0)$. Thus proposition 1 holds. ■

III. Exact Sparsest Recovery

In the previous section, we presented the sparsity invariant property (SIP) for a linear transformation $\Phi$, which preserves the sparsest error vector after transformation. In this section, we aim to prove that for any $\ell_0$ minimization problem, an SIT always exists. This ensures that a sparsest error detection problem can be equivalently converted into an SIT-based $\ell_1$ minimization problem defined by Problem $(SIT_1)$:

$(SIT_1) \quad \min_x \|\Phi y - \Phi A x\|_1.$

Define $\hat{x}_\Phi$ as a solution to Problem $(SIT_1)$ and let $\hat{e}_\Phi$ be the corresponding $\ell_1$-minimal error vector. We also refer to this problem as an SIT-$\ell_1$ minimization problem in the rest of the paper.

Before we prove the existence result, we introduce an example to demonstrate the essential concept of SIT-$\ell_1$ minimization in sparsest error detection.

A. Example where direct $\ell_1$ minimization fails

Researchers often use the $\ell_1$ norm to relax the $\ell_0$ norm in order to avoid NP-hard computational complexity. Unfortunately, in many cases direct $\ell_1$ relaxation fails to detect the sparsest errors. The following example presents one of the failed cases. Define $A$ and $y$ by

$A = [-1, 1, -10]^T$ and $y = [-1, 1, 0]^T$.

Then the corresponding sparsest error vector is $e^* = [0, 0, 10]^T$ and the $\ell_1$-minimal error vector is $\hat{e} = [-1, 1, 0]^T$. Obviously, the two error vectors are not the same and they have different support sets.

To take the SIT-$\ell_1$ minimization approach we define a transformation $\Phi$ by

$\Phi = \begin{bmatrix} 0.5000 & 0.5000 & 0.7071 \\ 0.5000 & 0.5000 & -0.7071 \\ -0.7071 & 0.7071 & 0 \end{bmatrix}.$


One can verify that $\mathrm{span}([A, y])^{\perp} = \{\lambda z \mid \lambda \in \mathbb{R}\}$ where the vector $z$ is set as $[1, 1, 0]^T$, and $\Phi z = z$ for the defined $\Phi$. This shows that $\Phi$ belongs to $\mathcal{R}$ and hence it is an SIT. Moreover,

$\Phi y = [0, 0, 1.4142]^T$ and $\Phi A = [-7.0711, 7.0711, 1.4142]^T$.

The sparsest error vector after transformation is $e_\Phi^* = [0, 0, 1.4142]^T$. Obviously $e_\Phi^*$ ($= 0.14142 e^*$) is equal to the original sparsest error vector except for the magnitude. The $\ell_1$-minimal error vector after transformation is $\hat{e}_\Phi = [0, 0, 1.4142]^T$, which is equal to $e_\Phi^*$. This example presents a case where SIT-$\ell_1$ minimization outperforms direct $\ell_1$ minimization in sparsity seeking.
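The numbers in this example can be checked with a few lines of code. The following is a minimal sketch in plain Python, not the authors' implementation: the helper names `l1_fit_scalar` and `matvec` are hypothetical, the nine entries of $\Phi$ are read row by row, and the $\ell_1$ fit relies on the unknown $x$ being a scalar here ($r = 1$), so that a minimizer of the piecewise-linear cost lies at one of the breakpoints $y(i)/A(i)$.

```python
import math

def l1_fit_scalar(a, y):
    """Minimise sum_i |y[i] - a[i]*x| over a scalar x.

    The cost is convex and piecewise linear in x, so a minimiser sits at one
    of the breakpoints y[i]/a[i]; we simply evaluate the cost at each of them.
    """
    candidates = [yi / ai for ai, yi in zip(a, y) if ai != 0]
    cost = lambda x: sum(abs(yi - ai * x) for ai, yi in zip(a, y))
    x_best = min(candidates, key=cost)
    return x_best, [yi - ai * x_best for ai, yi in zip(a, y)]

def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

s = 1 / math.sqrt(2)                      # 0.7071...
A = [-1.0, 1.0, -10.0]
y = [-1.0, 1.0, 0.0]
Phi = [[0.5, 0.5, s],                     # entries of the example, row by row
       [0.5, 0.5, -s],
       [-s, s, 0.0]]

# Direct l1 minimisation on (A, y): the error keeps two non-zero entries.
x_direct, e_direct = l1_fit_scalar(A, y)

# l1 minimisation after the transformation: one non-zero entry remains.
PhiA, Phiy = matvec(Phi, A), matvec(Phi, y)
x_sit, e_sit = l1_fit_scalar(PhiA, Phiy)

print("direct l1 error     :", e_direct)
print("transformed l1 error:", [round(v, 4) for v in e_sit])
```

The direct fit returns an error supported on the first two entries, while the fit after the transformation returns an error supported on the third entry only, which is the support of $e^*$.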

Ignoring differences in magnitude, an SIT can be geometrically understood as a rotation of the measurement vector space around the "axis" defined by $\mathrm{span}([A, y])^{\perp}$ that makes the $\ell_1$-minimal and $\ell_0$-minimal errors coincide. The top figure in Fig. 1 shows a failed case of directly applying $\ell_1$ minimization for sparsest detection, whereas the bottom figure in Fig. 1 presents the scenario in which the $\ell_0$-minimal error vector coincides with the $\ell_1$-minimal error vector after applying an SIT.

Fig. 1. $A$ denotes a measurement matrix (here a column vector), $y$ is the measured data, Phi A and Phi y denote $\Phi A$ and $\Phi y$ respectively, "axis" $U^c$ denotes $\mathrm{span}([A, y])^{\perp}$, and the blue ball denotes an $\ell_1$ ball. The red point denotes the $\ell_1$-minimal error vector and the blue point denotes the $\ell_0$-minimal error vector. The red point and the blue point are two different points before transformation, but they coincide after transformation.


B. Mathematical Verification 

In this subsection, we present a proof of the existence of an SIT that can guarantee the detection of the sparsest errors for an error detection problem. To facilitate subsequent discussions, we first introduce a family of polytopes, $\Gamma(t)$, defined as below:

$\Gamma(t) = \left\{\alpha \;\middle|\; \alpha = [A, y] \begin{bmatrix} p \\ f \end{bmatrix},\ p \in \mathbb{R}^r,\ f \in \mathbb{R}_{>0},\ \|\alpha\|_1 \leq t \right\}.$

From the definition of $\Gamma(t)$, one can infer that the polytopes defined by $\Gamma(t)$ have the same shape and their scales vary with the parameter $t$.

Assume $\alpha^*$ is one of the extreme points of $\Gamma(t)$; according to the separation theorem there exists a specific vector $a \in \mathrm{span}([A, y])$ that guarantees:

$a^T(\alpha - \alpha^*) < 0, \quad \forall \alpha \neq \alpha^*,\ \alpha \in \Gamma(t). \qquad (7)$

To better understand SIT, we first show in lemma 2 that there are finitely many sparsest error vectors to Problem $(\ell_0)$.

Lemma 2 The number of sparsest error vectors $e^*$ to Problem $(\ell_0)$ is no more than $\binom{n}{r}$.

Proof We can always find a row index set $S \subseteq [n]$ with $\mathrm{rank}(A(S, \cdot)) = r$ such that

$(y - e)(S) = A(S, \cdot)x$

no matter what the real error $e$ is. Given such an $S$, both $x$ and the corresponding error vector are uniquely determined. There are at most $\binom{n}{r}$ such $S$. Therefore, the number of sparsest error vectors is no more than $\binom{n}{r}$; thus lemma 2 holds. ■
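The counting argument of lemma 2 also suggests a brute-force solver for Problem $(\ell_0)$ on very small instances. The sketch below is an illustration under stated assumptions rather than part of the proposed method: it uses NumPy, the function name `sparsest_errors_bruteforce` and the tolerance `tol` are hypothetical, and it assumes that $A$ has full column rank $r$ so that every sparsest error vector vanishes on some row set $S$ of size $r$ with $\mathrm{rank}(A(S, \cdot)) = r$.

```python
from itertools import combinations
import numpy as np

def sparsest_errors_bruteforce(A, y, tol=1e-9):
    """Enumerate row sets S of size r, solve A(S,.) x = y(S), keep the sparsest errors.

    Cost grows like C(n, r), so this is only usable for tiny problems.
    """
    n, r = A.shape
    best, best_nnz = [], n + 1
    for S in combinations(range(n), r):
        rows = list(S)
        A_S = A[rows, :]
        if np.linalg.matrix_rank(A_S) < r:
            continue                      # x is not determined by this row set
        x = np.linalg.solve(A_S, y[rows])
        e = y - A @ x                     # error vector that is zero on S
        nnz = int(np.sum(np.abs(e) > tol))
        if nnz < best_nnz:
            best, best_nnz = [e], nnz
        elif nnz == best_nnz and not any(np.allclose(e, b, atol=tol) for b in best):
            best.append(e)                # a different error with the same sparsity
    return best

# The example of Section III-A: A and y are the vectors defined there.
A = np.array([[-1.0], [1.0], [-10.0]])
y = np.array([-1.0, 1.0, 0.0])
best = sparsest_errors_bruteforce(A, y)
print(best)
```

On the example of Section III-A the enumeration visits three row sets and finds a single sparsest error vector, $[0, 0, 10]^T$.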

Based on lemma 2, we arrive at the following conclusion in lemma 3.

Lemma 3 If the sparsest error vector $e^*$ to Problem $(\ell_0)$ is not unique, then different sparsest error vectors have different support sets.

Proof Suppose $e_1$ and $e_2$ are two different sparsest error vectors that share the same support set. Assume

$e_1 = y - A x_1,$
$e_2 = y - A x_2.$

Then we have

$t e_1 + (1 - t) e_2 = y - A[t x_1 + (1 - t) x_2]$

for all real numbers $t$. Each such combination is an error vector supported within the shared support set, so it is also a sparsest error vector. Since $t$ can be an arbitrary real number, we have infinitely many sparsest error vectors. This conclusion contradicts lemma 2; therefore the supposition is wrong, and lemma 3 holds. ■

The following lemma characterizes another important feature of the sparsest error vectors to Problem $(\ell_0)$.

Lemma 4 If $e^*$ is one of the sparsest error vectors to Problem $(\ell_0)$, then it is an extreme point of $\Gamma(\|e^*\|_1)$.

Proof If $e^*$ is not an extreme point, there exist two points $p_1$ and $p_2$ such that

$e^* = t p_1 + (1 - t) p_2,$

for some $t \in (0, 1)$. Therefore, we have

$\|e^*\|_1 = \|t p_1 + (1 - t) p_2\|_1 \leq t \|p_1\|_1 + (1 - t) \|p_2\|_1. \qquad (8)$

Since $e^*$ lies on the boundary of $\Gamma(\|e^*\|_1)$, we can replace the inequality in (8) with an equality. On the other hand, the equality in (8) holds if and only if the following two conditions are satisfied:

$\|e^*\|_1 = \|p_1\|_1 = \|p_2\|_1,$

and the non-zero entries of $p_1$ and $p_2$ share the same signs. The latter condition implies $\mathrm{supp}(e^*) = \mathrm{supp}(p_1) \cup \mathrm{supp}(p_2)$. Since $e^*$ is a sparsest point of $\Gamma(t)$, we must have $\mathrm{supp}(e^*) = \mathrm{supp}(p_1) = \mathrm{supp}(p_2)$. From the definition of $\Gamma(\|e^*\|_1)$, one can infer that with proper scaling parameters, denoted by $\lambda_1$ and $\lambda_2$, $\lambda_1 p_1$ and $\lambda_2 p_2$ are actually two sparsest error vectors to Problem $(\ell_0)$. According to lemma 3, different sparsest error vectors have different support sets. Therefore, $\lambda_1 p_1$, $\lambda_2 p_2$ and $e^*$ must be the same sparsest error vector, that is,

$e^* = p_1 = p_2.$

This result implies $e^*$ is an extreme point of $\Gamma(\|e^*\|_1)$. Thus lemma 4 holds. ■

According to lemma 4, we observe that $\frac{\|\hat{e}\|_1}{\|e^*\|_1} e^*$ is one of the extreme points of the polytope $\Gamma(\|\hat{e}\|_1)$, where $\hat{e}$ is an $\ell_1$-minimal error vector to Problem $(\ell_1)$. Obviously, there exists a vector $a$ with the property:

$a^T\left(\alpha - \frac{\|\hat{e}\|_1}{\|e^*\|_1} e^*\right) < 0, \quad \forall \alpha \neq \frac{\|\hat{e}\|_1}{\|e^*\|_1} e^*,\ \alpha \in \Gamma(\|\hat{e}\|_1). \qquad (9)$

Condition (9) motivates the following proposition, which is essential to the proposed methodology.

Proposition 2 If $\Phi \in \mathcal{R}$ is orthogonal and satisfies the equation

$\Phi u_{r+1} = a, \qquad (10)$

with $a$ defined in (9) and $u_{r+1}$ being a vector of unit $\ell_2$ norm in $\mathrm{span}([A, y])$, orthogonal to $\mathrm{span}(A)$, then

$\hat{e}_\Phi = e_\Phi^*,$

where $e_\Phi^*$ and $\hat{e}_\Phi$ are a sparsest error vector and an $\ell_1$-minimal error vector after transformation, respectively.

It should be noted that since both $a$ and $u_{r+1}$ belong to $\mathrm{span}([A, y])$, there exist transformations that establish (10).

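Proof: According to proposition 1, we have

$e_\Phi^* = \frac{\|e_\Phi^*\|_1}{\|e^*\|_1} e^* \qquad (11)$

for any $\Phi \in \mathcal{R}$. By substituting (11) into (9) we get

$a^T\left(\alpha - \frac{\|\hat{e}\|_1}{\|e_\Phi^*\|_1} e_\Phi^*\right) < 0, \quad \forall \alpha \neq \frac{\|\hat{e}\|_1}{\|e_\Phi^*\|_1} e_\Phi^*,\ \alpha \in \Gamma(\|\hat{e}\|_1). \qquad (12)$

Setting $\beta = \frac{\|e_\Phi^*\|_1}{\|\hat{e}\|_1} \alpha$, we get

$a^T(\beta - e_\Phi^*) < 0, \quad \forall \beta \neq e_\Phi^*,\ \beta \in \Gamma(\|e_\Phi^*\|_1).$

Observe that $\hat{e}_\Phi \in \Gamma(\|e_\Phi^*\|_1)$. Suppose $\hat{e}_\Phi \neq e_\Phi^*$; then we have

$a^T(\hat{e}_\Phi - e_\Phi^*) < 0. \qquad (13)$

Combining (10) with (13) we obtain the inequality

$(\Phi u_{r+1})^T(\hat{e}_\Phi - e_\Phi^*) < 0. \qquad (14)$

On the other hand, there exist vectors $x_1$ and $x_2$ such that

$e_\Phi^* = \Phi y - \Phi A x_1,$

$\hat{e}_\Phi = \Phi y - \Phi A x_2.$

It follows from the orthogonality of $\Phi$ and from

$u_{r+1}^T A(x_1 - x_2) = 0$

that

$(\Phi u_{r+1})^T(\hat{e}_\Phi - e_\Phi^*) = 0,$

which contradicts the result in (14). So we have

$\hat{e}_\Phi = e_\Phi^*.$

Thus proposition 2 holds. ■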

Proposition 2 shows that for any $\ell_0$ minimization problem, there exists an SIT which enables the detection of the sparsest errors by means of SIT-$\ell_1$ minimization of a corresponding problem. In the following discussion, we reveal another nice property of SIT-$\ell_1$ minimization.


Corollary 1 If $\Phi$ is an orthogonal matrix in $\mathcal{R}$ that satisfies (10), then the corresponding SIT-$\ell_1$ minimization problem has a unique solution.

Proof Suppose the corresponding SIT-$\ell_1$ problem does not have a unique solution. Given two distinct solutions to the SIT-$\ell_1$ problem, there will be two distinct $\ell_1$-minimal error vectors, $e_1$ and $e_2$, since $A$ is assumed to have full column rank. For all $t$ in $(0, 1)$, define an error vector $e_t$ by

$e_t = t e_1 + (1 - t) e_2.$

It follows that

$\|e_t\|_1 \leq t \|e_1\|_1 + (1 - t) \|e_2\|_1.$

Since both $e_1$ and $e_2$ are $\ell_1$-minimal error vectors, we have

$\|e_t\|_1 = \|e_1\|_1 = \|e_2\|_1,$

which implies that $e_t$ is also an $\ell_1$-minimal error vector to the SIT-$\ell_1$ minimization problem. According to propositions 1 and 2, we know that for any $t \in (0, 1)$, there exists a scaling factor, $\lambda$, that can scale $\lambda e_t$ to become a sparsest error vector to Problem $(\ell_0)$. However, this conclusion contradicts lemma 2, which states that the number of sparsest error vectors to Problem $(\ell_0)$ is finite. By contradiction, it follows that corollary 1 holds. ■

This corollary implies that the uniqueness of the $\ell_1$-minimal error vector is guaranteed in the proposed SIT-based methodology.

IV. Sparsest Solutions for an Under-determined System by SIT-$\ell_1$ Minimization

In this section, we consider finding sparsest solutions to an under-determined system

$\bar{y} = Fe, \qquad (15)$

where $\bar{y}$ denotes measured data, $F$ denotes an $m \times n$ measurement matrix with $m < n$, and $e$ denotes an unknown signal to be estimated. Although an under-determined system has many solutions, usually people are more interested in the sparsest one for its succinctness in capturing the essential features of a phenomenon. The sparsest solution can be obtained by solving the following combinatorial optimization problem:

$\min_e \|e\|_0, \quad \text{s.t. } \bar{y} = Fe. \qquad (16)$

To reduce the computational complexity, the $\ell_1$ relaxation method was introduced to provide an approximate solution to problem (16). The relaxed problem can be written as

$\min_e \|e\|_1, \quad \text{s.t. } \bar{y} = Fe. \qquad (17)$

Recently, sufficient conditions have been proposed to guarantee the sparsest recovery ability of the above $\ell_1$ minimization problem. However, as mentioned before, these conditions may limit the applicability of the $\ell_1$ relaxation technique to various practical problems.

To avoid imposing conditions on the measurement matrix $F$, we want to apply an SIT to achieve a solution equivalence between $\ell_0$ and $\ell_1$ minimizations in the under-determined setting. To utilize the previously derived results, we convert problem (16) into an error detection problem. To start the conversion, we first compute a least-mean-squares solution, $y = F^{+}\bar{y}$, to the under-determined system $\bar{y} = Fe$, where $\bar{y}$ is the measured data in (15) and $F^{+}$ denotes the pseudo-inverse of $F$. Assume the dimension of the kernel of $F$ is $r$. Then we introduce an $n \times r$ matrix $A$ whose range is the kernel of $F$. Therefore, any vector belonging to the kernel of $F$ can be represented as $Ax$ with $x \in \mathbb{R}^r$ a coefficient vector.

Finally, as one can verify, problem (16) can be equivalently converted into an error detection problem as below:

$\min_x \|y - Ax\|_0. \qquad (18)$

Observe that this problem is the same as Problem $(\ell_0)$; therefore the corresponding SIT-$\ell_1$ minimization can be written as

$\min_x \|\Phi y - \Phi A x\|_1,$

where $\Phi$ is an orthogonal SIT. It follows from the SIT definition that $\Phi$ should satisfy

(con 1) $\Phi f = f,\ \forall f \in \mathrm{span}([A, y])^{\perp}.$

According to proposition 2, the SIT-$\ell_1$ minimization problem succeeds in recovering a sparsest solution if $\Phi$ further satisfies condition (10), that is,

(con 2) $\Phi u_{r+1} = a.$

Since $y \in \mathrm{span}(F^T)$ and $\mathrm{span}(A) = \mathrm{span}(F^T)^{\perp}$, the columns of $A$ are orthogonal to $y$. As a result, $u_{r+1}$ in (con 2) can be set as $u_{r+1} = \frac{y}{\|y\|_2}$.

On the other hand, as one can verify, the above SIT-$\ell_1$ minimization problem is equivalent to the following problem:

$\min_e \|e\|_1, \quad \text{s.t. } \bar{y} = F \Phi^T e. \qquad (19)$

Problem (19) actually formulates an SIT-based $\ell_1$ minimization for the problem of seeking a sparsest solution to an under-determined system.
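The conversion from the under-determined system to an error detection problem can be sketched numerically as follows. This is an illustrative NumPy sketch, not the authors' code: the sizes, the random test data and the names `y_bar`, `rank` and `Vt` are assumptions, the kernel basis is taken from the full SVD of $F$, and $F$ is assumed to have full row rank.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6                               # assumed sizes with m < n
F = rng.standard_normal((m, n))           # random full-row-rank measurement matrix
y_bar = rng.standard_normal(m)            # measured data of the under-determined system

# Particular solution via the pseudo-inverse; it lies in span(F^T).
y = np.linalg.pinv(F) @ y_bar

# Columns of A span ker(F): take the right singular vectors beyond rank(F).
_, s, Vt = np.linalg.svd(F)               # full SVD, Vt is n x n
rank = int(np.sum(s > 1e-12))
A = Vt[rank:].T                           # n x r with r = n - rank
r = A.shape[1]

# Every e = y - A x solves F e = y_bar, so minimising ||y - A x||_0 over x
# searches exactly the solution set of the under-determined system.
x = rng.standard_normal(r)
e = y - A @ x
print(np.allclose(F @ e, y_bar), np.allclose(A.T @ y, 0.0))
```

The second check confirms that the columns of $A$ are orthogonal to $y$, which is what allows $u_{r+1}$ to be taken along $y$.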

In conclusion, a sparsest solution to problem (16) can be determined by problem (19) provided that an orthogonal $\Phi$ satisfies (con 1) and (con 2). In fact, we can derive a vector decomposition representation of $\Phi F^T$ according to (con 1) and (con 2). First expand the matrix $F$ as follows:

$F = [f_1, \ldots, f_m]^T,$

where $f_i \in \mathbb{R}^n$ denotes the transpose of the $i$-th row vector of $F$. Each $f_i$ is actually orthogonal to $\mathrm{span}(A)$. Since $F u_{r+1} \neq 0$, we decompose each $f_i$ as follows:

$f_i = (f_i^T u_{r+1}) u_{r+1} + [f_i - (f_i^T u_{r+1}) u_{r+1}], \quad \forall i \in [m].$

Obviously, the second term belongs to $\mathrm{span}([A, y])^{\perp}$. According to (con 1), one gets

$\Phi [f_i - (f_i^T u_{r+1}) u_{r+1}] = [f_i - (f_i^T u_{r+1}) u_{r+1}].$

According to (con 2), one gets

$\Phi f_i = (f_i^T u_{r+1}) a + [f_i - (f_i^T u_{r+1}) u_{r+1}].$

In short, we have

$\Phi F^T = \mathcal{A}\,\mathrm{diag}(F u_{r+1}) + F^T - U\,\mathrm{diag}(F u_{r+1}), \qquad (20)$

where $\mathcal{A} = [a, \ldots, a]$ and $U = [u_{r+1}, \ldots, u_{r+1}]$ both have $m$ identical columns.

From (20), we can see that the determination of $\Phi F^T$ can be reduced to the determination of $a$. If the exact value of $a$ is derived in some way, then a sparsest solution can be easily obtained by solving the corresponding SIT-$\ell_1$ minimization problem.
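The decomposition (20) can be checked numerically once some orthogonal $\Phi$ satisfying (con 1) and (con 2) is available. The sketch below is an assumption-laden illustration: it builds $\Phi$ as a Householder reflection that maps $u_{r+1}$ to $a$, which is one convenient choice and not necessarily the transformation the authors have in mind, and it draws $a$ as an arbitrary unit vector in $\mathrm{span}([A, y])$ without testing whether it is feasible in the sense of (9). The products with the repeated-column matrices are written as outer products, $a (F u_{r+1})^T$ and $u_{r+1} (F u_{r+1})^T$, which is what the column-wise relation for $\Phi f_i$ gives.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6                               # assumed sizes with m < n
F = rng.standard_normal((m, n))           # assumed to have full row rank
y_bar = rng.standard_normal(m)

y = np.linalg.pinv(F) @ y_bar             # particular solution in span(F^T)
A = np.linalg.svd(F)[2][m:].T             # orthonormal basis of ker(F), n x (n - m)
u = y / np.linalg.norm(y)                 # u_{r+1}, orthogonal to span(A)

# Arbitrary unit vector a in span([A, y]) (feasibility w.r.t. (9) is NOT checked).
B = np.column_stack([A, u])               # orthonormal basis of span([A, y])
f = rng.standard_normal(B.shape[1])
a = B @ (f / np.linalg.norm(f))

# Householder reflection sending u to a; it fixes everything orthogonal to u - a,
# in particular span([A, y])^perp, so (con 1) and (con 2) both hold.
w = u - a
Phi = np.eye(n) - 2.0 * np.outer(w, w) / (w @ w)

Fu = F @ u
lhs = Phi @ F.T
rhs = np.outer(a, Fu) + F.T - np.outer(u, Fu)
print(np.allclose(lhs, rhs))
```

Because only $a$ enters the right-hand side, the check also illustrates why the search can be carried out over $a$ alone, without forming $\Phi$ explicitly.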


V. Randomized Algorithm Based on Monte Carlo 

Previous discussions imply that finding a sparsest solution via SIT-$\ell_1$ minimization is equivalent to determining a feasible choice of $a$. Unfortunately, we have not found a tractable way to obtain a feasible choice for $a$ so far. Instead we propose a randomized algorithm based on the Monte Carlo method to search for one. The randomized algorithm may be inefficient when the data dimension is high, but it lends itself to parallel computation.

Let $U_r$ denote an orthobasis of $\mathrm{span}(A)$. By applying a singular value decomposition to $A$ we get $A = U \Sigma V^T$. Since the rank of $A$ is assumed to be $r$, $U_r$ can be obtained by taking the first $r$ columns of $U$. Assume $y \notin \mathrm{span}(A)$; then there exists a unit vector $u_{r+1}$ that makes $U_{r+1} = [U_r, u_{r+1}]$ an orthobasis of $\mathrm{span}([A, y])$.

Note that $a$ belongs to $\mathrm{span}(U_{r+1})$ and has $\ell_2$ norm equal to 1. Denote by $\mathcal{A}$ the set of feasible choices of $a$ that satisfy condition (9). Let

$B = \{f \mid f = U_{r+1}^T a,\ \|a\|_2 = 1,\ a \in \mathcal{A}\}.$

If we generate an $f \in \mathbb{R}^{r+1}$ with i.i.d. Gaussian entries and normalize its $\ell_2$ norm to one, then a random $a$ can be generated by the equation $a = U_{r+1} f$. Obviously, a random $f$ belongs to $B$ with a certain fixed probability. If we generate enough $\ell_2$-normalized random vectors $f$, the event that at least one of these random vectors generates a feasible value of $a$ happens with high probability.

To ease the implementation, we recommend converting the sparsest error detection problem into a problem of finding sparsest solutions to an under-determined system. In this way we avoid dealing with $\Phi A$, since this term vanishes after the conversion. The conversion procedure is as follows: Since $U_{r+1}$ contains an orthobasis of $\mathrm{span}([A, y])$, $y$ can be represented as:

$y = [U_r, u_{r+1}] \begin{bmatrix} \beta \\ t \end{bmatrix},$

with $t \in \mathbb{R}$ and $\beta \in \mathbb{R}^r$. Moreover, for any $x \in \mathbb{R}^r$, we have $Ax = U_r h$ for some $h \in \mathbb{R}^r$. As a result, Problem $(\ell_0)$ can be rewritten as:

$\min_h \left\| [U_r, u_{r+1}] \begin{bmatrix} \beta \\ t \end{bmatrix} - U_r h \right\|_0.$

Setting $z = h - \beta$, the above problem can be further simplified to

$\min_z \|t u_{r+1} - U_r z\|_0. \qquad (21)$

On the other hand, set $F \in \mathbb{R}^{(n-r) \times n}$ as

$F = [u_{r+1}, \ldots, u_n]^T,$

where $\{u_{r+1}, \ldots, u_n\}$ denotes an orthobasis of $\mathrm{span}(A)^{\perp}$. So we have $F U_r = 0$ and thus we get

$t F u_{r+1} - F U_r z = t F u_{r+1} = [t, 0, \ldots, 0]^T.$

Then an equivalent representation of problem (21) can be written as

$\min_e \|e\|_0, \quad \text{s.t. } Fe = [t, 0, \ldots, 0]^T. \qquad (22)$

Thus we finish the conversion.

According to the discussions in the last section, the corresponding SIT-$\ell_1$ minimization formulation of problem (22) can be written as:

$\min_e \|e\|_1, \quad \text{s.t. } F \Phi^T e = [t, 0, \ldots, 0]^T,$

where $\Phi$ is required to satisfy

$\Phi u_i = u_i, \quad i \in \{r + 2, \ldots, n\}, \qquad (23)$

and

$\Phi u_{r+1} = a. \qquad (24)$

Based on (23) and (24), we get

$\Phi [u_{r+1}, \ldots, u_n] = [a, u_{r+2}, \ldots, u_n].$

In conclusion, we can formulate the SIT-$\ell_1$ minimization for problem (22) as below:

$\min_e \|e\|_1, \quad \text{s.t. } a^T e = t,\ [u_{r+2}, \ldots, u_n]^T e = 0. \qquad (25)$

We have verified in the last section that SIT-$\ell_1$ minimization can be applied to obtain a sparsest solution to an under-determined system. The next step is to present how to apply a randomized algorithm to solve the SIT-$\ell_1$ minimization problem. The basic idea of the randomized algorithm is as follows: First we generate many random vectors $a$. For each generated random vector $a$, we can adopt a fast and accurate first-order method [11], the linear programming method [12], or many other algorithms to obtain a solution to problem (25). Ignoring differences in magnitude, the sparsest of all these solutions can be taken approximately to be the final sparsest error vector. If the number of random vectors $a$ is large enough, then the randomized algorithm eventually returns a truly sparsest error vector with high probability.

Algorithm 1 presents the framework of the proposed randomized algorithm for sparsest error detection. This program was implemented in Matlab. In algorithm 1, "SNbr" represents the number of randomly generated values for $a$, the operation "svd(A)" implements the singular value decomposition of $A$, $U(:, n : m)$ denotes the sub-matrix formed by selecting columns $n$ to $m$ of $U$, $[x]_+$ denotes the vector obtained by setting the negative entries of $x$ to zero while the positive entries remain the same, and the notation "$\circ$" denotes the Hadamard product, i.e., the matrix obtained by entry-by-entry multiplication.

Due to the limited computational accuracy of Matlab, most entries of a computed sparse vector may not be exactly zero. To mitigate this problem, we set a positive threshold $\epsilon$ in algorithm 1, which is used to set entries with very small values (smaller than $\epsilon$) to zero.


VI. Numerical Study 

In this section, we present a series of experiments to demonstrate the effectiveness of SIT-$\ell_1$ minimization in sparsest error detection. In the first subsection, we aim to give readers a quick impression of the performance of the proposed method in robust linear regression. In the second subsection, we aim to highlight the statistical performance of the proposed method for sparsest error detection involving high dimensional data sets. In these two subsections, we use $\mathcal{N}(a, b)$ to denote a Gaussian distribution with mean $a$ and standard deviation $b$.

Algorithm 1 A Randomized Algorithm for Sparsest Detection

Require: $y \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$, $\epsilon$ and SNbr
Ensure: $\hat{e}$
1: Compute $u_{r+1}$, $U_r$, $U_{r+1}^c$:
   $U_r = U_1(:, 1 : r)$ where $[U_1, S_1, V_1] = \mathrm{svd}(A)$;
   $u_{r+1} = z_{r+1} / \|z_{r+1}\|_2$ where $z_{r+1} = y - U_r (U_r^T U_r)^{-1} U_r^T y$;
   $U_{r+1}^c = U_2(:, r + 2 : m)$, $[U_2, S_2, V_2] = \mathrm{svd}([A, y])$.
2: Compute $y^*$:
   $y^* = [t, 0, \ldots, 0]^T \in \mathbb{R}^{m-r}$ where $t = u_{r+1}^T y$.
3: Randomly generate $a$:
   generate $f \in \mathbb{R}^{r+1}$ with i.i.d. Gaussian entries;
   $a = U_{r+1} f$ where $f = f / \|f\|_2$.
4: Adopt the basis pursuit algorithm:
   for $i = 1 :$ SNbr
      generate $a$ by step 3;
      set $F = [a, U_{r+1}^c]^T$;
      $e_i = \arg\min_e \|e\|_1$, s.t. $y^* = Fe$;
      $e_i = \mathrm{sign}(e_i) \circ [|e_i| - \epsilon]_+$;
      $i = i + 1$.
   end for
5: Find the sparsest solution $\hat{e}$:
   $\hat{e} = \arg\min_{e \in \{e_1, \ldots, e_{SNbr}\}} \|e\|_0$.
6: return $\hat{e}$
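Algorithm 1 can be sketched in Python for very small problems. The following is a hedged illustration, not the authors' Matlab program: the function names `l1_min_equality` and `sit_l1_detect` and the parameters `n_samples` and `seed` are hypothetical, indices are 0-based, and the basis pursuit step is replaced by an enumeration of basic solutions (valid because the $\ell_1$ problem is a linear program whose optimum is attained at a basic solution, but only affordable in tiny dimensions). A real implementation would call an LP or first-order solver instead.

```python
from itertools import combinations
import numpy as np

def l1_min_equality(F, b):
    """min ||e||_1 s.t. F e = b by enumerating basic solutions (tiny problems only).

    Assumes F has full row rank; stands in for a basis pursuit / LP solver.
    """
    k, m = F.shape
    best, best_cost = None, np.inf
    for S in combinations(range(m), k):
        cols = list(S)
        F_S = F[:, cols]
        if abs(np.linalg.det(F_S)) < 1e-12:
            continue                              # singular basis, skip
        e = np.zeros(m)
        e[cols] = np.linalg.solve(F_S, b)
        cost = np.abs(e).sum()
        if cost < best_cost:
            best, best_cost = e, cost
    return best

def sit_l1_detect(A, y, eps=1e-8, n_samples=50, seed=0):
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    r = np.linalg.matrix_rank(A)
    # Step 1: orthobasis of span(A), unit vector u_{r+1}, basis of span([A, y])^perp.
    U_r = np.linalg.svd(A)[0][:, :r]
    z = y - U_r @ (U_r.T @ y)
    u_next = z / np.linalg.norm(z)
    U_c = np.linalg.svd(np.column_stack([A, y]))[0][:, r + 1:]
    U_r1 = np.column_stack([U_r, u_next])
    # Step 2: right-hand side y* = [t, 0, ..., 0]^T with t = u_{r+1}^T y.
    y_star = np.zeros(m - r)
    y_star[0] = u_next @ y
    candidates = []
    for _ in range(n_samples):
        # Step 3: random unit vector a in span([A, y]).
        f = rng.standard_normal(r + 1)
        a = U_r1 @ (f / np.linalg.norm(f))
        # Step 4: solve the l1 problem for F = [a, U_c]^T, then soft-threshold.
        e = l1_min_equality(np.vstack([a, U_c.T]), y_star)
        if e is None:
            continue
        candidates.append(np.sign(e) * np.maximum(np.abs(e) - eps, 0.0))
    # Step 5: keep the candidate with the fewest non-zero entries.
    return min(candidates, key=np.count_nonzero)

A = np.array([[-1.0], [1.0], [-10.0]])            # example of Section III-A
y = np.array([-1.0, 1.0, 0.0])
e_hat = sit_l1_detect(A, y)
print(np.flatnonzero(e_hat))
```

On the three-dimensional example of Section III-A the returned vector is supported on the third entry only. As stated in the text, the result identifies the sparsest error up to magnitude, so only its support should be compared with that of $e^*$.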


A. Robust Linear Regression 

Linear regression is an important methodology in statistics. It aims to find the hidden linear relationship between the outputs and the inputs. Traditionally, the least mean squares method is applied to estimate the hidden signal. However, this method usually lacks robustness, especially in cases where there are many outliers among the measurements. Recently, researchers have become increasingly interested in robust methods, which are expected to have high resistance to outliers.

The proposed SIT-$\ell_1$ minimization is a strong candidate for robust regression. To demonstrate this, we compare the proposed SIT-$\ell_1$ minimization with a popular robust regression method, the least absolute deviation (LAD). In fact, the LAD method is a direct $\ell_1$ minimization method.

The setup for the robust linear regression experiments is as follows: The entries of the hidden parameter vector $x \in \mathbb{R}^2$ are generated by a Gaussian distribution $\mathcal{N}(0, 1)$. The measurement matrix $A$ is set as:

$A = [a, \mathbf{1}],$

where $\mathbf{1} \in \mathbb{R}^{52}$ is a column vector whose entries are all ones. In Fig. 2, $a \in \mathbb{R}^{52}$ is generated as $a(i) = i$ for each $i$ in [0,56]. In Fig. 3, $a$ is generated as

$a(i) = i$ if $i < 32$, and $a(i) = i + 20$ otherwise.

Fig. 2. Comparing Performance with a Uniform Outlier Distribution: Estimations of $y$ obtained from different algorithms are compared against the exact value. $a(i)$ denotes the x-coordinate of the $i$-th blue point, whose y-coordinate represents the actual measurement. The blue line represents the error-free $y$ value, the red line represents the estimated value according to SIT-$\ell_1$ minimization, and the green line the estimated value achieved by LAD. The noise level is relatively small in figure (a) and relatively high in figure (b). For both cases, the estimations provided by the proposed method are more precise than those provided by LAD.

Fig. 3. Comparing Performance with a Tailored Outlier Distribution: Estimations of $y$ obtained by different algorithms are compared against the exact value. $a(i)$ indicates the x-coordinate of the $i$-th blue point, whose y-coordinate represents the actual measurement. The blue line represents the error-free $y$ value, the red line represents the estimated value according to SIT-$\ell_1$ minimization and the green line the estimated value achieved by LAD. This result suggests that the proposed method has a much higher resistance to arbitrarily distributed errors than the LAD method.


The number of non-zero entries of the error vector $e \in \mathbb{R}^{52}$ is set to 20. Moreover, if the $i$-th entry of $e$,
$e(i)$, is non-zero, then $e(i)$ is set to 20. To make the simulation study more realistic, we add small
Gaussian noise to the measured data $y$. Therefore, the final $y$ is generated by the equation $y = Ax + e + z$,
where $z$ denotes a Gaussian noise vector. In Fig. 2(a) and in Fig. 3, the entries of $z$ are generated by the Gaussian
distribution $\mathcal{N}(0, 0.5)$ and the threshold $\epsilon$ is set to 0.3, whereas in Fig. 2(b), the entries of $z$ are generated by
$\mathcal{N}(0, 2)$ and $\epsilon = 1.8$. In Fig. 2 and Fig. 3, SNbr, the sample number, is set to 80. The task of the corresponding
linear regression problem is to estimate the hidden parameters (i.e., the entries of $x$) as accurately as possible
given $y$ and $A$. We can therefore apply Algorithm 1, presented in the previous section for SIT-$\ell_1$ minimization,
to estimate the hidden parameters. Since Gaussian noise is included in the measurements, we
replace the basis pursuit algorithm in Algorithm 1 with the basis pursuit denoising algorithm [13].
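
The data model $y = Ax + e + z$ described above can be sketched as follows for the Fig. 2(a) setting. Several details are assumptions made only for this illustration: the index $i$ is taken to run from 1 to 52, the second argument of $\mathcal{N}(0, 0.5)$ is read as a variance, and the outlier positions are drawn uniformly at random. The function name `make_regression_data` is hypothetical.

```python
import random


def make_regression_data(m=52, num_outliers=20, outlier_value=20.0,
                         noise_std=0.5 ** 0.5, seed=0):
    """Return (a, x, e, y) with y[i] = x[0]*a[i] + x[1] + e[i] + z[i]."""
    rng = random.Random(seed)
    a = [float(i) for i in range(1, m + 1)]           # a(i) = i (assumed 1-based)
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]    # hidden parameters ~ N(0, 1)
    e = [0.0] * m
    for idx in rng.sample(range(m), num_outliers):    # uniformly placed outliers
        e[idx] = outlier_value
    z = [rng.gauss(0.0, noise_std) for _ in range(m)] # small Gaussian noise
    y = [x[0] * a[i] + x[1] + e[i] + z[i] for i in range(m)]
    return a, x, e, y


a, x, e, y = make_regression_data()
print(len(y), sum(1 for v in e if v != 0))
```

Fixing the seed makes a run reproducible, which is convenient when comparing several estimators on the same corrupted data.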

Fig. 2 shows that SIT-$\ell_1$ minimization is more robust than the LAD method under different levels of Gaussian
noise. Fig. 3 presents a data set based on a tailored error distribution to highlight the weakness of direct $\ell_1$
minimization. The results indicate that direct $\ell_1$ minimization completely loses the ability to distinguish outliers for this
data set, whereas the performance of the proposed method is resistant to this change in error distribution.

B. Statistical Performance of the SIT-$\ell_1$ Minimization

It has been shown that direct $\ell_1$ minimization is efficient in detecting the sparsest errors when the measurement
matrix is generated at random with i.i.d. entries. However, there are situations where direct $\ell_1$
minimization fails. SIT-$\ell_1$ minimization offers a feasible alternative for these cases. To demonstrate this, we
generate a measurement matrix with entries that are random but not independent and identically distributed. For
this measurement matrix, the comparison shows that direct $\ell_1$ minimization fails to detect arbitrarily distributed
sparse errors whereas the proposed method remains effective.

Fig. 4. Comparison under Different Error Distributions: Different error distributions were obtained by varying t from 0.6 to 1. The graph
shows the ratio of correct detections for a given method, which is defined as the proportion of cases in which the method correctly predicts
whether or not there is an error in the measurement.

To this end, we only need to generate a random matrix $A$ from two Gaussian distributions with
different means. We set the number of rows of $A$ to 256. Given an integer $k \in (0, 256)$, each entry in the first $256 - k$
rows of $A$ is generated by $\mathcal{N}(0, 1)$, whereas the entries in the remaining rows of $A$ are generated by $\mathcal{N}(5, 1)$. To test
whether SIT-$\ell_1$ minimization and direct $\ell_1$ minimization can detect sparse errors with arbitrary distributions,
we set up a series of error distributions as follows. Let $s$ denote the number of errors, that is, $\|e\|_0 = s$.
Given a real number $t \in [0, 1]$, let $t \cdot s$ be the number of errors that appear among the first $256 - k$ entries of $e$. These
errors are distributed uniformly. Similarly, let the rest of the errors appear uniformly among the last $k$ entries of $e$.
We obtain different error distributions by varying $t$ in $[0, 1]$. Moreover, we set $e(i) = 10$ if $e(i) \neq 0$, and we
generate each entry of $x$ by $\mathcal{N}(0, 1)$; the final $y$ is then given by $y = Ax + e$. In the following experiments,
unless otherwise stated, we set $\epsilon = 0.2$.
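
One trial of this setup can be sketched as follows. The sketch assumes that $A$ has $r$ columns (so that its rank is $r$ almost surely) and that $t \cdot s$ is rounded to the nearest integer; neither choice is stated in the text. The function name `make_trial` and its default arguments are hypothetical.

```python
import random


def make_trial(m=256, r=16, k=56, s=36, t=0.8, error_value=10.0, seed=0):
    """Return (A, x, e, y) with y = A x + e for one error distribution t."""
    rng = random.Random(seed)
    top = m - k                                  # rows drawn from N(0, 1)
    A = [[rng.gauss(0.0 if i < top else 5.0, 1.0) for _ in range(r)]
         for i in range(m)]                      # last k rows drawn from N(5, 1)
    x = [rng.gauss(0.0, 1.0) for _ in range(r)]
    s_top = round(t * s)                         # assumption: round t*s to an integer
    support = (rng.sample(range(top), s_top)     # errors among the first m-k entries
               + rng.sample(range(top, m), s - s_top))  # rest among the last k
    e = [0.0] * m
    for idx in support:
        e[idx] = error_value
    y = [sum(A[i][j] * x[j] for j in range(r)) + e[i] for i in range(m)]
    return A, x, e, y


A, x, e, y = make_trial(t=0.8)
print(sum(1 for v in e[:200] if v != 0), sum(1 for v in e[200:] if v != 0))
```

Varying `t` between 0 and 1 moves the errors from the rows with mean 5 to the rows with mean 0, which is the parameter swept in the comparison.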

In Fig. 4, we set s = 36, r = 16 (r is the rank of A), k = 56 and SNbr = 100. The graph compares the
proposed method with two popular methods, direct $\ell_1$ minimization and reweighted $\ell_1$ minimization [16], under
different error distributions parameterized by t. For each t, we run 200 trials and obtain the ratio of correct detections for
each method. From Fig. 4, we can see that when t > 0.75, both direct $\ell_1$ minimization and reweighted $\ell_1$ minimization
frequently fail to detect the sparse errors, whereas the performance of the proposed method remains unaffected by
the change of error distribution.

Fig. 5. Detection Accuracy versus Sample Number: The graph shows the ratio of correct detections for a given method (as defined in
Fig. 4) against the number of samples. A turning point appears around SNbr = 20. Generally speaking, the higher the SNbr, the better the
performance of the proposed algorithm.

Fig. 6. Detection Accuracy versus Rank of A: The graph shows the ratio of correct detections for a given method (as defined in Fig. 4) against
the rank of A. As the rank increases, the ratio of correct detections decreases.
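
The ratio of correct detections reported in Fig. 4 can be computed in more than one way. The sketch below takes one possible reading, in which a trial counts as correct when the support of the estimated error vector equals the support of the true error vector, and the ratio is the fraction of correct trials. This reading, the function names, and the toy vectors are assumptions made for illustration.

```python
def support(e, tol=0.0):
    """Indices whose magnitude exceeds tol."""
    return {i for i, v in enumerate(e) if abs(v) > tol}


def correct_detection_ratio(true_errors, estimates, tol=0.0):
    """Fraction of trials in which the estimated support matches the true one."""
    if len(true_errors) != len(estimates) or not true_errors:
        raise ValueError("need the same positive number of trials on both sides")
    hits = sum(1 for e, e_hat in zip(true_errors, estimates)
               if support(e, tol) == support(e_hat, tol))
    return hits / len(true_errors)


# Two hypothetical trials: the first support is recovered, the second is not.
true_errors = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
estimates = [[9.7, 0.0, 0.0], [0.0, 0.0, 4.2]]
ratio = correct_detection_ratio(true_errors, estimates)
print(ratio)
```

Comparing supports rather than values matters here because the thresholded estimates shrink each magnitude by $\epsilon$ and therefore do not reproduce the error values exactly.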

Below, we demonstrate how the detection accuracy of the proposed method varies with SNbr, r and
s, respectively. For simplicity, we set k = s and t = 1. Under this parameter setting, we observe that both
direct $\ell_1$ minimization and reweighted $\ell_1$ minimization fail to detect all the sparse errors exactly. Hence, the
following figures only present the performance of the proposed method. We run 200 trials for each setting
of (SNbr, r, s).

Fig. 7. Detection Accuracy versus Number of Errors: As the number of errors increases, it becomes harder for the proposed method to detect all
the errors correctly.

Generally speaking, the larger the SNbr, the higher the probability that the proposed algorithm detects all the
sparse errors. However, the time needed to search for the solution is shorter when SNbr is smaller. For Fig. 5, we set
r = 16 and s = 36. The graph shows that if SNbr is larger than 20, the proposed method achieves good
error-detection performance.

In Fig. 6, we set s = 36 and SNbr = 40. The graph shows that the detection accuracy of the proposed method
decreases as the rank of A increases. However, if SNbr increases with the rank of A, the detection accuracy may improve.

In Fig. 7, we set r = 16 and SNbr = 120. The graph suggests a very high error-detection capability of the
proposed methodology, since it fails to detect all the sparse errors exactly only when the number of corrupted measurements
approaches half of the total number of measurements.


VII. Conclusion 

This paper presents a new methodology, SIT-$\ell_1$ minimization, for sparsest-error detection in an over-determined
linear system. The basic idea is to use a sparsity invariant transformation (SIT) to transform the original linear
system into a representation in which the $\ell_0$-minimal error vector coincides with the $\ell_1$-minimal error vector. A
contribution of this paper lies in showing the existence of such an SIT for a general over-determined linear system.
Moreover, this methodology can be applied to sparsest recovery for an under-determined linear system. Compared
to the methodology proposed in compressed sensing, an advantage of the proposed methodology is the removal
of structural constraints on the measurement matrices. So far, there is no known efficient algorithm to construct a
feasible SIT for a linear system. Instead, we provide a randomized algorithm based on Monte Carlo simulation
to search for one. The numerical results demonstrate the performance improvement of the proposed method in
comparison with direct $\ell_1$ minimization and reweighted $\ell_1$ minimization.